Abstract


This paper studies a situation in which correct knowledge is harmful to a problem
solver  even  given  unlimited  computational  resources.  A  knowledge  base  is  defined  to  be 
sociopathic  if  all  the  tuples  in  the  knowledge  base  are  individually  judged  to  be  correct 
and  a  subset  of  the  knowledge  base  gives  better  performance  than  the  original  knowledge 
base  independent  of  the  amount  of  computational  resources  that  are  available.  Almost  all 
knowledge  bases  that  contain  probabilistic  rules  are  shown  to  be  sociopathic  and  so  this 
problem  is  very  widespread. 

Sociopathicity  has  important  consequences  for  rule  induction  methods  and  rule  set 
debugging  methods.  Sociopathic  knowledge  bases  cannot  be  properly  debugged  using  the 
widespread  practice  of  incremental  modification  and  deletion  of  rules  responsible  for  wrong 
conclusions  a  la  Teiresias;  this  approach  fails  to  converge  to  an  optimal  solution.  The 
problem  of  optimally  debugging  sociopathic  knowledge  bases  is  modeled  as  a  bipartite  graph 
minimization  problem  and  shown  to  be  NP-hard.  Our  heuristic  solution  approach  is  called 
the  Sociopathic  Reduction  Algorithm  and  experimental  results  verify  its  efficacy. 
1      Introduction

Reasoning under uncertainty has been widely investigated in artificial intelligence.
Probabilistic approaches are of particular relevance to rule-based expert systems, where one is
interested in modeling the heuristic and evidential reasoning of experts. Methods
developed to represent and draw inferences under uncertainty include the certainty factors used
in Mycin (Buchanan and Shortliffe, 1984), fuzzy set theory (Zadeh, 1979), and the belief
functions of Dempster-Shafer theory (Shafer, 1976; Gordon and Shortliffe, 1985). In many
expert system frameworks, such as Emycin, Expert, MRS, S.1, and Kee, the rule structure
permits a conclusion to be drawn with varying degrees of certainty or belief. This paper
addresses a concern common to all these methods and systems.

In  refining  and  debugging  a  probabilistic  rule  set,  there  are  three  major  causes  of 
errors:  missing  rules,  wrong  rules,  and  deleterious  interactions  between  good  rules.  The 
purpose  of  this  paper  is  to  explicate  a  type  of  deleterious  interaction  and  to  show  that  it  (a) 
is  indigenous  to  rule  sets  for  reasoning  under  uncertainty,  (b)  is  of  a  fundamentally  different 
nature  from  missing  and  wrong  rules,  (c)  cannot  be  handled  by  traditional  methods  for 
correcting  wrong  and  missing  rules,  and  (d)  can  be  handled  by  the  method  described  in  this 
paper. 

In section 2, we describe the type of deleterious rule interactions that we have
encountered in connection with automatic induction of rule sets, and explain why the use of
most rule modification methods fails to grasp the nature of the problem. In section 3, we
discuss approaches to debugging and refining rule sets and explain why traditional rule set
debugging methods are inadequate for handling global interactions. In section 4, we
formulate the problem of reducing deleterious interactions as a bipartite graph minimization
problem and show that it is NP-hard. In section 5, we present a heuristic method called
the Sociopathic Reduction Algorithm. Finally, our experiences in using the Sociopathic
Reduction Algorithm are described.

A brief description of terminology will be helpful to the reader. Assume there exists
a collection of training instances, each represented as a set of feature-value pairs of evidence
and a set of hypotheses.


Rules  are  in  Horn  clause  form:  conclude(H,  CF)  :-  E  ,  where  E  is  a  conjunction  of 
evidence,  H  is  a  hypothesis,  and  CF  is  a  certainty  factor  or  its  equivalent. 

A  rule  that  correctly  confirms  a  hypothesis  generates  true  positive  evidence;  one  that 
correctly  disconfirms  a  hypothesis  generates  true  negative  evidence.  A  rule  that  incorrectly 
confirms  a  hypothesis  generates  false  positive  evidence;  one  that  incorrectly  disconfirms  a 
hypothesis  generates  false  negative  evidence.  False  positive  and  false  negative  evidence  can 
lead  to  misdiagnoses  of  training  instances. 
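The four kinds of evidence defined above can be summarized in a small sketch. The function name `evidence_type` and the convention that a positive CF confirms a hypothesis while a negative CF disconfirms it are assumptions made for illustration; they are not taken from any particular expert system shell.

```python
def evidence_type(cf: float, hypothesis_true: bool) -> str:
    """Name the evidence that a fired rule contributes for one training instance.

    cf              -- certainty factor of the rule (assumed: > 0 confirms, < 0 disconfirms)
    hypothesis_true -- whether the instance is a positive instance of the hypothesis
    """
    if cf == 0:
        raise ValueError("a CF of 0 contributes no evidence")
    confirms = cf > 0
    if confirms:
        # Confirming a true hypothesis is correct; confirming a false one is not.
        return "true positive" if hypothesis_true else "false positive"
    # Disconfirming a false hypothesis is correct; disconfirming a true one is not.
    return "true negative" if not hypothesis_true else "false negative"
```

For example, a rule with CF 0.77 that fires on an instance where the hypothesis is false contributes false positive evidence.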


2      Inexact  Reasoning  and  Rule  Interactions 

When operating as an evidence-gathering system (Buchanan and Shortliffe, 1984), an
expert system accumulates evidence for and against competing hypotheses. Each rule whose
preconditions match the gathered data contributes either positively or negatively toward
one or more hypotheses. Unavoidably, the preconditions of probabilistic rules succeed on
instances where the rule will be contributing false positive or false negative evidence for
conclusions. For example, consider the following rule:

conclude(klebsiella, 0.77) :-                    (R1)
    finding(surgery, yes),
    finding(gram_neg_infection, yes)

The frequency with which R1 generates false positive evidence has a major influence
on its CF of 0.77, where $-1 \le CF \le 1$. Indeed, given a representative set of training
instances, such as a library of medical cases, the certainty factor of a rule can be given
a probabilistic interpretation1 as a function $G(x_1, x_2, x_3)$, where $x_1$ is the fraction of the
positive instances of a hypothesis where the rule premise succeeds, thus contributing true
positive or false negative evidence; $x_2$ is the fraction of the negative instances of a hypothesis
where the rule premise succeeds, thus contributing false positive or true negative evidence;
and $x_3$ is the ratio of positive instances of a hypothesis to all instances in the training
set. For R1 in our domain, $G(.43, .10, .22) = 0.77$ by the formulas in Appendix A, because
statistics on 104 training instances yield the following values:

$$
\begin{aligned}
x_1 &: \; E \text{ true among positive instances} &&= 10/23 \\
x_2 &: \; E \text{ true among negative instances} &&= 8/81 \\
x_3 &: \; H \text{ true among all instances} &&= 23/104
\end{aligned}
\tag{1}
$$

1  See Appendix 1 for a description of the function G. The calculations of G give a purely statistical
interpretation to CFs, and hence do not incorporate orthogonal utility measures as was done in MYCIN
(Buchanan and Shortliffe, 1984).
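As a quick check on the arithmetic, the three statistics for R1 can be recomputed from the raw counts quoted above (10 of 23 positive instances and 8 of 81 negative instances satisfy the premise). The helper name `rule_statistics` and its parameter names are hypothetical. The function G itself is defined in the appendix and is not reproduced here.

```python
def rule_statistics(pos_fired: int, pos_total: int, neg_fired: int, neg_total: int):
    """Return (x1, x2, x3) for one rule and one hypothesis.

    x1 -- fraction of positive instances on which the rule premise succeeds
    x2 -- fraction of negative instances on which the rule premise succeeds
    x3 -- fraction of all instances that are positive
    """
    x1 = pos_fired / pos_total
    x2 = neg_fired / neg_total
    x3 = pos_total / (pos_total + neg_total)
    return x1, x2, x3


# Counts for R1: 23 positive and 81 negative instances, 104 in total.
x1, x2, x3 = rule_statistics(pos_fired=10, pos_total=23, neg_fired=8, neg_total=81)
print(f"{x1:.2f} {x2:.2f} {x3:.2f}")  # prints "0.43 0.10 0.22"
```

The rounded values match the arguments of G used in the text.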

Hence, R1 generates false positive evidence on eight instances, some of which may
lead to false negative diagnoses. But whether they do or not depends on the other rules
in the system; hence our emphasis on taking a global perspective. The usual method of
dealing with situations such as this is to make the rule fail less often by specializing its
premise (Michalski et al., 1983). For example, surgery could be specialized to neurosurgery,
and we could replace R1 with:

conclude(klebsiella, 0.92) :-                    (R2)
    finding(neurosurgery, yes),
    finding(gram_neg_infection, yes)

On our case library of training instances for the R2 rule, $G(.26, .02, .22) = 0.92$, so R2
makes erroneous inferences in two instances instead of eight. Nevertheless, modifying R1
to be R2 on the grounds that R1 contributes to a misdiagnosis is not always appropriate;
we offer three objections to this frequent practice. First, both rules are inexact rules that
offer advice in the face of limited information, and their relative accuracy and correctness is
explicitly represented by their respective CFs. We expect them to fail, hence failure should
not necessarily lead to their modification. Second, all probabilistic rules reflect a trade-off
between generality and specificity. An overly general rule provides too little discriminatory
power, and an overly specific rule contributes too infrequently to problem solving. A policy
on proper grain size is explicitly or implicitly built into rule induction programs; this policy
should be followed as much as possible. Specialization produces a rule that usually violates
such a policy. Third, if the underlying problem for an incorrect diagnosis is rule interactions,
a more specialized rule, such as the specialization of R1 to R2, can be viewed as creating
a potentially more dangerous rule. Although it only makes an incorrect inference in two
instead of eight instances, these two instances will now be harder to counteract when they
contribute to misdiagnoses because R2 is stronger. Note that a rule with a large CF is more
likely to have its erroneous conclusions lead to misdiagnoses. This perspective motivates the
prevention of misdiagnoses in ways other than the use of rule specialization or generalization.

Besides rule modification, another common method of nullifying the incorrect
inference of a rule in an evidence-gathering system is to introduce counteracting rules. In our
example, these would be rules with a negative CF that conclude Klebsiella on the false
positive training instances that lead to misdiagnoses. But since these new rules are
probabilistic, they will introduce false negatives on some other training instances, and these
may lead to misdiagnoses. We could add yet more counteracting rules with a positive CF
to nullify any problems caused by the original counteracting rules, but these additional
rules introduce false positives on yet other training instances, and these may lead to other
misdiagnoses. Also, a counteracting rule is often of lower quality than the rules in
the original rule set; if it were otherwise, the induction program would have included the
counteracting rule in the original rule set. Clearly, adding counteracting rules is not
necessarily the best way of dealing with misdiagnoses made by probabilistic rules.


3      Debugging  Rule  Sets  and  Rule  Interactions 

Assume  we  are  given  a  set  of  probabilistic  rules  that  were  either  automatically  induced  from 
a  set  of  training  cases  or  created  manually  by  an  expert  and  knowledge  engineer.  In  refining 
and  debugging  this  probabilistic  rule  set,  there  are  three  major  causes  of  errors:  missing 
rules,  wrong  rules,  and  unexpected  interactions  among  good  rules.  We  first  describe  types  of 
rule  interactions,  and  then  show  how  the  traditional  approach  to  debugging  is  inadequate. 

3.1      Types  of  rule  interactions 

In  a  rule-based  system,  there  are  many  types  of  rule  interactions.  Rules  interact  by  chaining 
together,  by  using  the  same  evidence  for  different  conclusions,  and  by  drawing  the  same 
conclusions  from  different  collections  of  evidence.  Thus  one  of  the  lessons  learned  from 
research on MYCIN was that complete modularity of rules is not possible to achieve when
rules are written manually (Buchanan and Shortliffe, 1984). An expert uses other rules in a
set  of  closely  interacting  rules  in  order  to  define  a  new  rule,  in  particular  to  set  a  CF  value 
relative  to  the  CFs  of  interacting  rules. 

Automatic  rule  induction  systems  encounter  the  same  problems.  Moreover,  automatic 
systems  lack  an  understanding  of  the  strong  semantic  relationships  among  concepts  to  allow 
judgments  about  the  relative  strengths  of  evidential  support.  Instead,  induction  systems 
use  biases  to  guide  the  rule  search  (Michalski  et  al.,  1983).  The  rule  sets  that  are  later 
analyzed  for  sociopathicity  in  this  paper  were  generated  by  the  induction  subsystem  of 
ODYSSEUS.  The  inductive  biases  used  in  this  system  are  rule  generality,  whereby  a  rule 
must  cover  a  certain  percentage  of  instances;  rule  specificity,  whereby  a  rule  must  be  above 
a  minimum  discrimination  threshold;  rule  colinearity,  whereby  rules  must  not  be  too  similar 
in  classification  of  the  instances  in  the  training  set;  and  rule  simplicity,  whereby  a  maximum 
bound  is  placed  on  the  number  of  conjunctions  and  disjunctions  (Wilkins,  1987). 

3.2      Traditional  methods  of  debugging  a  rule  set 

The standard approach to debugging a rule set consists of iteratively performing the
following steps:

•  Step 1. Run the system on cases until a false diagnosis is made.

•  Step 2. Track down the error and correct it, using one of five methods pioneered by
Teiresias (Davis, 1982) and used by knowledge engineers generally:

-  Method 1: Make the preconditions of the offending rules more specific or
sometimes more general.2

-  Method 2: Make the conclusions of offending rules more general or sometimes
more specific.

-  Method 3: Delete offending rules.

-  Method 4: Add new rules that counteract the effects of offending rules.

-  Method 5: Modify the strengths or CFs of offending rules.

2  Ways of generalizing and specializing rules are nicely described in (Michalski et al., 1983). They include
dropping conditions, changing constants to variables, generalizing by internal disjunction, tree climbing,
interval closing, exception introduction, etc.

This approach may be sufficient for correcting wrong and missing rules. However,
it is theoretically flawed as a way of correcting problems that result from the global
behavior of rules over a set of cases. It has two
serious methodological problems. First, not all five of these methods are
appropriate for dealing with global deleterious interactions. In section 2 we explained
why in some situations modifying the offending rule or adding counteracting rules leads to
problems and misses the point of having probabilistic rules; this eliminates methods 1,
2 and 4. If rules are being induced from a representative set of training cases, modifying the
strength of a rule is not permissible, since the strength has a probabilistic interpretation
derived from frequency information in the training instances; this
eliminates method 5. Only method 3 is left to cope with deleterious interactions. The
second methodological problem is that the traditional method picks an arbitrary case to
run in its search for misdiagnoses. Such a procedure will often not converge to a good
rule set, even if modifications are restricted to rule deletion. The example in section 5.2
illustrates this situation.

Our  perspective  on  this  topic  evolved  in  the  course  of  experiments  in  induction  and 
refinement  of  knowledge  bases.  Using  "better"  induction  biases  did  not  always  produce 
rule  sets  with  better  performance,  and  this  prompted  investigating  the  possibility  of  global 
probabilistic  interactions.  Our  original  approach  to  debugging  was  similar  to  the  Teiresias 
approach. Often, correcting a problem led to other cases being misdiagnosed, and in fact this
type of automated incremental debugging seldom converged to an acceptable set of rules. It
might have if we had engaged in the common practice of "tweaking" the CF strengths of rules.
However, this was not permissible, since our CF values were derived from a representative
set of training cases and have a precise probabilistic interpretation.


4      Minimizing  Sociopathic  Interactions 

Assume  there  exists  a  large  set  of  training  instances,  and  a  rule  set  for  solving  these  instances 
has  been  induced  that  is  fairly  complete  and  contains  rules  that  are  individually  judged  to  be 
good.  By  good,  we  mean  that  they  individually  meet  some  predefined  quality  standards  such 
as the biases described in section 3.1. Further, assume that the rule set misdiagnoses some
of  the  instances  in  the  training  set.  Given  such  an  initial  rule  set,  the  problem  is  to  find  a 
rule  set  that  meets  some  optimality  criteria,  such  as  to  minimize  the  number  of  misdiagnoses 
without  violating  the  goodness  constraints  on  individual  rules.  Now  modifications  to  rules, 
except  for  rule  deletion,  generally  break  the  predefined  goodness  constraints.  And  adding 
other  rules  is  not  desirable,  for  if  they  satisfied  the  goodness  constraints  they  would  have 
been  in  the  original  rule  set  produced  by  the  induction  program.  Hence,  if  we  are  to  find  a 
solution  that  meets  the  described  constraints,  the  solution  must  be  a  subset  of  the  original 
rule  set.3  More  formally: 

Definition 1 (Sociopathic Knowledge Base) A knowledge base is sociopathic if and
only if (1) all the tuples in the knowledge base are individually judged to be good; and (2)
a subset of the knowledge base gives better performance than the original knowledge base
independent of the amount of available computational resources.

By  the  definition  of  a  sociopathic  knowledge  base,  the  best  rule  set  is  viewed  as  the 
element  of  the  power  set  of  rules  in  the  initial  rule  set  that  yields  a  global  minimum  weighted 
error.  A  straightforward  approach  is  to  examine  and  compare  all  subsets  of  the  rule  set. 
However, the power set is almost always too large to work with, especially when the initial
set has deliberately been generated generously. The selection process can be modeled as a
bipartite  graph  minimization  problem  as  follows. 
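To make the cost of the straightforward approach concrete, the following is a minimal sketch of an exhaustive search over the power set. All names (`best_subset`, `count_misdiagnoses`, the toy rules R1 to R3 and their CFs) are hypothetical, and the toy error function simply sums the CFs of the rules that fire and classifies an instance as positive when the sum exceeds 0. Both choices are assumptions for illustration only.

```python
from itertools import combinations

# Toy rule base: rule name -> CF (illustrative values only).
CF = {"R1": 0.5, "R2": -0.8, "R3": 0.4}
# Toy cases: (label, rules whose premises are satisfied); label is +1 or -1.
CASES = [(+1, {"R1", "R2"}), (-1, {"R2"}), (+1, {"R3"})]


def count_misdiagnoses(rule_set):
    """Count cases misclassified when only the rules in rule_set are used."""
    errors = 0
    for label, firing in CASES:
        evidence = sum(CF[r] for r in firing & rule_set)
        predicted = +1 if evidence > 0 else -1
        errors += predicted != label
    return errors


def best_subset(rules, error, r_min=0):
    """Examine every subset with at least r_min rules; 2**n subsets in the worst case."""
    rules = sorted(rules)
    best, best_errors = None, None
    for size in range(len(rules), r_min - 1, -1):  # larger subsets first
        for subset in combinations(rules, size):
            errors = error(frozenset(subset))
            if best_errors is None or errors < best_errors:
                best, best_errors = frozenset(subset), errors
    return best, best_errors


best, errors = best_subset(CF.keys(), count_misdiagnoses)
print(sorted(best), errors)  # prints "['R1', 'R3'] 0"
```

In this toy case every rule could be individually acceptable, yet the full set misdiagnoses one case while the subset without R2 misdiagnoses none. The loop visits all 2^n subsets, which is why it is only usable for very small rule sets.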

4.1      Bipartite  graph  minimization  formulation 

A bipartite graph $G = (V, E)$ is a graph whose vertices $V$ can be partitioned into two sets
$V_1$ and $V_2$ so that every edge in $E$ joins a vertex in $V_1$ to a vertex in $V_2$. For each hypothesis
in the set of training instances, define a directed bipartite graph $G = (V, E)$, with its
vertices $V$ partitioned into two sets $I$ and $R$, as shown in Figure 1. Elements of $R$ represent
rules, and the evidential strength of $R_j$ is denoted by $CF_j$. Each vertex in $I$ represents a
training instance; for positive instances $M_i$ is 1, and for negative instances $M_i$ is $-1$. Arcs
$[R_j, I_i]$ connect a rule in $R$ with the training instances in $I$ for which its preconditions are
satisfied; the weight of arc $[R_j, I_i]$ is $CF_j$. The weighted arcs terminating in a vertex in $I$
are combined using an evidence combination function $F$, which is defined by the user. The
combined evidence classifies an instance as a positive instance if the combined evidence is
above a user-specified threshold $CF_t$. In the example in section 5.2, $CF_t$ is 0, while for
Mycin, $CF_t$ is 0.2.

3  If we discover that this solution is inadequate for our needs, then introducing rules that violate the
induction biases is justifiable.
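Since the combination function F is left to the user, the sketch below shows one possible choice: the pairwise certainty-factor combining rule commonly associated with EMYCIN-style systems, folded over all the arcs that reach an instance. This is an assumption for illustration, not necessarily the function used in the experiments reported here. The names `combine_two`, `combine`, and `classify` are hypothetical.

```python
def combine_two(x: float, y: float) -> float:
    """Combine two certainty factors in [-1, 1] (assumed EMYCIN-style rule)."""
    if x >= 0 and y >= 0:
        return x + y * (1 - x)
    if x < 0 and y < 0:
        return x + y * (1 + x)
    denom = 1 - min(abs(x), abs(y))
    if denom == 0:
        # +1 and -1 meet; treating them as cancelling is an assumption of this sketch.
        return 0.0
    return (x + y) / denom


def combine(cfs) -> float:
    """Fold combine_two over the CFs of all rules that fire for one instance."""
    total = 0.0
    for cf in cfs:
        total = combine_two(total, cf)
    return total


def classify(cfs, cf_threshold: float = 0.0) -> bool:
    """An instance is classified as positive if the combined evidence exceeds CF_t."""
    return combine(cfs) > cf_threshold
```

For example, `classify([0.33, -0.75])` is `False` with a threshold of 0, because the stronger disconfirming rule outweighs the weaker confirming one.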


Figure 1: Bipartite Graph Formulation. The left hand nodes, $I_1, \ldots, I_m$,
represent a case library of $m$ training instances, where $M_i$ indicates
whether an instance is a positive or negative example of a hypothesis.
The right hand nodes, $R_1, \ldots, R_n$, represent a knowledge base of
probabilistic rules, where $CF_j$ is the strength of the rule. The links show
which training instances $I_1, \ldots, I_m$ satisfy the preconditions of rule $R_j$.

More formally, assume that $I_1, \ldots, I_m$ is a training set of instances and $R_1, \ldots, R_n$
are the rules of an initial rule set. Then we want to minimize:

$$
z = \sum_{i=1}^{m} b_i e_i \tag{2}
$$

subject to the constraints

$$
e_i =
\begin{cases}
0 & \text{if } F(a_{i1} r_1, \ldots, a_{in} r_n) > CF_t \text{ for } M_i = 1 \\
0 & \text{if } F(a_{i1} r_1, \ldots, a_{in} r_n) \le CF_t \text{ for } M_i = -1 \\
1 & \text{otherwise}
\end{cases}
\tag{3}
$$

$$
\sum_{j=1}^{n} r_j \ge R_{min} \tag{4}
$$

where

$b_i$ = bias constant to preferentially favor instances;

$r_j$ = if $R_j$ is in solution rule set then 1 else 0;

$a_{ij}$ = if arc $[R_j, I_i]$ exists then $CF_j$ else 0;

$CF_t$ = the CF threshold for positive classification;

$F$ = n-ary function for combining CFs, where
the time to evaluate is polynomial in $n$;

$R_{min}$ = minimum number of rules in solution set;
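The objective and its constraints can be evaluated directly for any candidate selection of rules. The following sketch does so under stated assumptions: the function name `weighted_error` is hypothetical, the toy matrix is invented, and plain summation is used as the combination function F purely because it is easy to follow. An instance counts as correctly classified when the combined evidence exceeds the threshold for a positive instance, or does not exceed it for a negative one.

```python
def weighted_error(a, M, r, F, cf_t=0.0, b=None, r_min=0):
    """Evaluate the global weighted error z for a rule selection r.

    a     -- m x n matrix; a[i][j] is CF_j if rule j fires on instance i, else 0
    M     -- list of m labels, +1 for positive and -1 for negative instances
    r     -- list of n values in {0, 1}; r[j] = 1 keeps rule j
    F     -- combination function taking a list of n weighted arcs
    cf_t  -- threshold for positive classification
    b     -- per-instance bias constants (defaults to 1 for every instance)
    r_min -- minimum number of rules that must be kept
    """
    if sum(r) < r_min:
        raise ValueError("selection violates the minimum rule set size")
    m = len(a)
    b = [1.0] * m if b is None else b
    z = 0.0
    for i in range(m):
        evidence = F([a[i][j] * r[j] for j in range(len(r))])
        classified_positive = evidence > cf_t
        e_i = 0 if classified_positive == (M[i] == 1) else 1
        z += b[i] * e_i
    return z


# Toy data: three instances, three rules with CFs 0.5, -0.8, and 0.4.
a = [[0.5, -0.8, 0.0],
     [0.0, -0.8, 0.0],
     [0.0, 0.0, 0.4]]
M = [1, -1, 1]
print(weighted_error(a, M, [1, 1, 1], F=sum))  # prints 1.0
print(weighted_error(a, M, [1, 0, 1], F=sum))  # prints 0.0
```

Dropping the second rule removes the only misdiagnosis in this toy case, so the subset has a lower weighted error than the full rule set.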

The problem is to find a subset of $R$ such that the global weighted error $z$ is minimum.
That is, the solution formulation solves for $r_j$; if $r_j = 1$ then rule $R_j$ is in the final rule
set. The main tasks of the user are to specify the evidence combination function $F$ and
to set up the $a_{ij}$ matrix, which associates rules and instances and indicates the strength of
the associations. Note that the value of $a_{ij}$ is zero if the preconditions of $R_j$ are not
satisfied in instance $I_i$. Preference can be given to particular instances via the bias $b_i$ in the
objective function $z$. For instance, the user may wish to favor the selection of rules that
will not misdiagnose certain instances by setting the corresponding $b_i$ to a very high value.
The $R_{min}$ constraint forces the solution rule set to be above a minimum size. This prevents
finding a solution that is too specialized for the training set, giving good accuracy on the
training set but having a high variance on other sets, which would lead to poor performance.


Theorem  1    The  bipartite  graph  minimization  problem  for  heuristic  rule  set  optimization 
is  NP-hard. 


Proof: To show that the bipartite graph minimization problem (BGMP) is NP-hard, we
shall reduce the Satisfiability problem (SAT) to it. The major difficulty is that we have to use
numerical combination functions to determine logical truth values of clauses. Assume there
are $l$ boolean variables $A_1, \ldots, A_l$ and $k$ clauses $C_1, C_2, \ldots, C_k$, where $C_i$ is a disjunction of
some  literals.  For  example,  C\    =    A\  V  A3  V  A\. 

1. Input transformation: SAT clauses are mapped into graph instance nodes and the
atoms of the clauses are mapped into rule nodes. Arcs connect rule nodes to instance nodes
when the respective literals appear in the respective clauses. Let $m = k$ and $n = l$. Let
each clause represent a positive instance, that is, set $M_i = 1$ for $1 \le i \le m$. Let $CF_j$ be 1
for $j = 1, 2, \ldots, n$. For each instance node $I_i$ (corresponding to $C_i$), define the combination
function as follows:

$$
F(a_{i1} r_1, \ldots, a_{in} r_n) = 1 - \prod_{j=1}^{n} \bigl(1 - g(a_{ij} r_j)\bigr) \tag{5}
$$

where

$$
g(a_{ij} r_j) =
\begin{cases}
a_{ij} r_j & \text{if } A_j \text{ appears in } C_i \\
1 - a_{ij} r_j & \text{if } \bar{A}_j \text{ appears in } C_i \\
0 & \text{otherwise}
\end{cases}
\tag{6}
$$

Note that $a_{ij} = CF_j = 1$ if either $A_j$ or $\bar{A}_j$ appears in $C_i$. Thus the $g(a_{ij} r_j)$ function can
be simplified to:

$$
g(a_{ij} r_j) =
\begin{cases}
r_j & \text{if } A_j \text{ appears in } C_i \\
1 - r_j & \text{if } \bar{A}_j \text{ appears in } C_i \\
0 & \text{otherwise}
\end{cases}
\tag{7}
$$

Since every clause is of the same importance, let $b_i = 1$ for all $i$ in the objective
function $z$. Let $R_{min} = 0$ to make its associated constraint trivially true. Finally, choose
$CF_t$ to be 0.

2.  Output  transformation:  The  output  transformation  is  that  (1)  if  Rj  remains  in 
the  final  rule  set,  Aj  is  assigned  to  be  true;  otherwise,  it  is  assigned  to  be  false;  (2)  SAT  is 
satisfied  only  if  z  =  0,  i.e.,  all  the  instances  are  correctly  classified. 

3. Justification: First, it is clear that the input and output transformations can be
performed in polynomial time. Second, we will show that $C_i$ is satisfied iff the corresponding
$I_i$ is correctly classified in the final rule set, i.e., $e_i = 0$. To help understand the functionality
of $g(a_{ij} r_j)$, let us rewrite it as follows:

$$
g(a_{ij} r_j) =
\begin{cases}
1 & \text{if } A_j \text{ appears in } C_i \text{ and } r_j = 1, \text{ or if } \bar{A}_j \text{ appears in } C_i \text{ and } r_j = 0 \\
0 & \text{otherwise}
\end{cases}
\tag{8}
$$


If part: Assume that $e_i = 0$, i.e., $F(a_{i1} r_1, \ldots, a_{in} r_n) > 0$ ($F$ must be 1); then at least
one $g(a_{ij} r_j)$ is 1. By the definition of $g(a_{ij} r_j)$ above, either $A_j$ appears in $C_i$ and $r_j = 1$ or
$\bar{A}_j$ appears in $C_i$ and $r_j = 0$. In either case, according to the output transformation, the
corresponding clause $C_i$ is satisfied (true).

Only if part: Assume that $C_i$ is satisfied by the truth assignment in the final rule
set. Then there must exist some atom $A_j$ such that either $A_j$ is in $C_i$ and it is assigned to
be true or $\bar{A}_j$ is in $C_i$ and $A_j$ is assigned to be false. In either case, $g(a_{ij} r_j) = 1$, by the output
transformation and the definition of the function. Therefore, $F(a_{i1} r_1, \ldots, a_{in} r_n) = 1$ and
$e_i = 0$.

To summarize, $g(a_{ij} r_j)$ being 1 corresponds intuitively to the positive contribution
made by $A_j$ to $C_i$.

Finally, we show that the SAT instance is satisfiable iff the BGMP so constructed has a minimum
objective value of 0. If the BGMP has a solution with $z = 0$, then $e_i = 0$ for all $i$, because
$b_i = 1$. Therefore each $C_i$ is satisfied and thus the SAT instance is satisfiable. Conversely, if the SAT
instance is satisfiable then all the $C_i$ can be satisfied by a single truth assignment of atoms. Clearly,
the final rule set of the BGMP formulation (of SAT) can be easily constructed with $z = 0$,
according to that assignment. □
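The reduction can be exercised on small formulas. The sketch below builds the combination function of the proof for each clause, enumerates every rule selection by brute force, and reports the minimum number of misclassified clause-instances; the formula is satisfiable exactly when that minimum is 0. The clause encoding (a list of nonzero integers, where `j` stands for atom A_j and `-j` for its negation) and the function names are assumptions made for illustration.

```python
from itertools import product


def clause_evidence(clause, r):
    """F for one clause-instance: 1 - prod(1 - g), with g as in the proof.

    clause -- list of nonzero ints; j means A_j, -j means the negation of A_j
    r      -- tuple of 0/1 values; r[j-1] = 1 means rule R_j is kept (A_j is true)
    """
    remaining = 1
    for literal in clause:
        r_j = r[abs(literal) - 1]
        g = r_j if literal > 0 else 1 - r_j
        remaining *= 1 - g
    # Atoms that do not occur in the clause have g = 0 and contribute a factor of 1.
    return 1 - remaining


def min_weighted_error(clauses, n):
    """Brute-force minimum of z over all 2**n rule selections (b_i = 1, CF_t = 0)."""
    best = None
    for r in product((0, 1), repeat=n):
        z = sum(0 if clause_evidence(c, r) > 0 else 1 for c in clauses)
        if best is None or z < best[0]:
            best = (z, r)
    return best


# (A1 or not A2) and (A2) and (not A1 or A2) is satisfied by A1 = A2 = true.
print(min_weighted_error([[1, -2], [2], [-1, 2]], n=2))  # prints "(0, (1, 1))"
# (A1) and (not A1) is unsatisfiable, so one instance is always misclassified.
print(min_weighted_error([[1], [-1]], n=1)[0])  # prints 1
```

The exhaustive loop is exponential in the number of atoms; it is used here only to confirm the correspondence on tiny inputs, not as a practical solver.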

Corollary  1  Given  a  positive  real  number  B ,  the  problem  of  determining  if  there  exists 
a  rule  set  whose  global  weighted  error  z  is  less  than  or  equal  to  B  in  the  bipartite  graph 
formulation  for  heuristic  rule  set  optimization  is  NP-complete. 

Proof: To show that this decision problem is in NP, we notice that it is easy to construct
a polynomial-time algorithm for checking whether or not the (weighted) number of misdiagnoses
by any given subset of R is less than or equal to B. It is NP-hard by an argument similar
to that in the proof of the above theorem. □
5      Sociopathic Reduction Algorithm

In  this  section,  a  heuristic  method  called  the  Sociopathic  Reduction  Algorithm  is  described, 
and  an  example  is  provided  based  on  the  graph  shown  in  Table  1. 

5.1      The  Sociopathic  Reduction  Algorithm 

The  following  heuristic  hill-climbing  search  method,  the  Sociopathic  Reduction  Algorithm, 
is  one  that  we  have  developed  and  used  in  our  experiments: 


•  Step 1. Assign values to penalty constants. Let $p_1$ be the penalty assigned to a
poison rule. A poison rule is a strong rule giving erroneous evidence for a case that
cannot be counteracted by the combined weight of all the rules in the rule base that
give correct evidence. Let $p_2$ be the penalty for contributing false positive evidence
to a misdiagnosed case, $p_3$ be the penalty for contributing false negative evidence to
a misdiagnosed case, $p_4$ be the penalty for contributing false positive evidence to a
correctly diagnosed case, $p_5$ be the penalty for contributing false negative evidence
to a correctly diagnosed case, and $p_6$ be the penalty for using weak rules. Let $h$ be
the maximum number of rules that are removed at each iteration. Let $R_{min}$ be the
minimum size of the solution rule set.

•  Step 2. Optional step for very large rule sets: given an initial rule set, create a new
rule set containing the n strongest rules for each case.

•  Step 3. Find all misdiagnosed cases for the rule set. If none exists, stop. Otherwise,
collect and rank the rules that contribute evidence toward these erroneous diagnoses.
The rank of rule $R_j$ is $\sum_{i=1}^{6} p_i n_{ij}$, where:

-  $n_{1j}$ = 1 if $R_j$ is a poison rule or its deletion leads to the creation of another
poison rule and 0 otherwise;

-  $n_{2j}$ = the number of misdiagnoses for which $R_j$ gives false positive evidence;

-  $n_{3j}$ = the number of misdiagnoses for which $R_j$ gives false negative evidence;

-  $n_{4j}$ = the number of correct diagnoses for which $R_j$ gives false positive evidence;

-  $n_{5j}$ = the number of correct diagnoses for which $R_j$ gives false negative evidence;

-  $n_{6j}$ = the absolute value of the CF of $R_j$;

•  Step 4. Eliminate the $h$ highest ranking rules.

•  Step 5. If the number of misdiagnoses is decreased, go to step 3.

•  Step 6. Else, if the number of misdiagnoses begins to increase and $h \ne 1$, then

-  Undo the last deletion, i.e., take back the most recently removed $h$ rules.4

-  $h \leftarrow h - 1$.5

-  Go to step 3.

•  Step 7. Otherwise, i.e., if the number of misdiagnoses is increased and $h = 1$, then
undo the last rule deletion; output the final rule set and stop.
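The control flow of steps 3 through 7 can be sketched as follows. This is a simplified illustration rather than the implementation used in the experiments: step 2 and poison-rule detection are omitted, the ranking function is supplied by the caller, and a deletion that leaves the number of misdiagnoses unchanged is treated as a failed step (the steps above do not say how ties are handled). All names and the toy data are hypothetical.

```python
from typing import Callable, Set


def sociopathic_reduction(
    rules: Set[int],
    count_misdiagnoses: Callable[[Set[int]], int],
    rank: Callable[[int, Set[int]], float],
    h: int = 5,
    r_min: int = 0,
) -> Set[int]:
    """Hill-climbing rule deletion; returns the reduced rule set."""
    current = set(rules)
    errors = count_misdiagnoses(current)
    while errors > 0 and len(current) > r_min:
        step = min(h, len(current) - r_min)
        # Steps 3-4: rank the rules and tentatively drop the highest-ranking ones.
        ranked = sorted(current, key=lambda rule: (-rank(rule, current), rule))
        candidate = current - set(ranked[:step])
        new_errors = count_misdiagnoses(candidate)
        if new_errors < errors:   # Step 5: fewer misdiagnoses, keep the deletion
            current, errors = candidate, new_errors
        elif h > 1:               # Step 6: undo the deletion and shrink the step
            h -= 1
        else:                     # Step 7: undo the deletion and stop
            break
    return current


# Toy data: rule id -> CF, and cases as (label, ids of rules that fire).
CF = {1: 0.5, 2: -0.8, 3: 0.4}
CASES = [(+1, {1, 2}), (-1, {2}), (+1, {3})]


def misdiagnosed(rule_set):
    """Cases whose summed evidence (threshold 0, an assumption) disagrees with the label."""
    return [(label, firing) for label, firing in CASES
            if (sum(CF[r] for r in firing & rule_set) > 0) != (label > 0)]


def count_misdiagnoses(rule_set):
    return len(misdiagnosed(rule_set))


def rank(rule, rule_set):
    """Simplified rank: misdiagnosed cases where the rule's evidence has the wrong sign."""
    return sum(1 for label, firing in misdiagnosed(rule_set)
               if rule in firing and CF[rule] * label < 0)


print(sorted(sociopathic_reduction({1, 2, 3}, count_misdiagnoses, rank, h=2)))  # prints "[1, 3]"
```

With `h=2` the first deletion removes rules 2 and 1 together, which does not reduce the number of misdiagnoses, so the deletion is undone and retried with `h=1`. Removing rule 2 alone then yields a rule set with no misdiagnoses.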

Each  iteration  of  the  algorithm  produces  a  new  rule  set,  and  each  rule  set  must  be 
rerun on all training instances to locate the new set of misdiagnosed instances. If this is
particularly difficult to do, the h parameter in step 4 can be increased, but there is the potential
risk of converging to a suboptimal solution. For each misdiagnosed instance, the automated
reasoning  system  that  uses  the  rule  set  must  be  able  to  explain  which  rules  contributed  to 
a  misdiagnosis.  Hence,  we  require  a  system  with  good  explanation  capabilities. 

The nature of an optimal rule set differs between domains. Penalty constants, $p_i$,
are the means by which the user can define an optimal policy. For instance, via $p_2$ and
$p_3$, the user can favor false positive over false negative misdiagnoses, or vice versa. For
medical expert systems, a false negative is often more damaging than a false positive, as
false positives generated by a medical program can often be caught by a physician upon
further testing. Patients with false negative diagnoses, however, may be sent home, never to be seen again.

In our experiments, the values of the six penalty constants were p_i = 10^(6-i). The h constant determines how many rules are removed on each iteration, and its value is about 5. Rmin is the minimum size of the solution rule set, usually about 90% of the original set; its usefulness was described in section 4.1.
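
With p_i = 10^(6-i), the ranking quantity behaves like a decimal number whose digits are the components of the ranking vector. The short Python sketch below illustrates this; the function name `ranking_quantity`, the index range i = 1, ..., 6, and the two sample vectors are assumptions made for illustration.

```python
def ranking_quantity(n, base=10):
    """Weighted sum of a ranking vector n = (n1j, ..., n6j) with p_i = base**(6 - i)."""
    if len(n) != 6:
        raise ValueError("expected six components")
    return sum(base ** (6 - i) * n_i for i, n_i in enumerate(n, start=1))


# Two hypothetical rules: one count in a higher category outweighs several
# counts in lower categories, as long as every count stays below `base`.
higher = ranking_quantity((0, 1, 0, 0, 0, 0.75))
lower = ranking_quantity((0, 0, 9, 9, 9, 0.99))
print(higher > lower)
```

Note that a count of ten or more in one category spills into the next higher one, so with larger case libraries this weighting is only approximately a strict category-by-category ordering.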


4 It is this step that makes it a hill-climbing algorithm.
5 Since h is usually small, say about 5, a decrement of 1 is the simplest choice, although a more complicated schedule of step decrements can be implemented for relatively large values of h.
I\R      R1(+.33)  R2(+.75)  R3(+.33)  R4(-.33)  R5(-.75)  R6(-.33)
I0(+)                           X
I1(+)       X         X
I2(+)       X         X         X                             X
I3(+)       X         X                   X                   X
I4(+)                 X         X         X         X
I5(-)       X         X                             X
I6(-)       X                   X         X                   X
I7(-)                                     X         X
I8(-)                           X         X         X         X
I9(-)                                               X         X

Table 1: An example for the Sociopathic Reduction Algorithm. There are ten training instances that are classified as positive (+) or negative (-) instances of the hypothesis. There are six rules, shown with their CF strengths. The marks indicate the instances to which the rules apply, i.e., the instances that satisfy the premise clauses of a rule.

5.2      Example  of  sociopathic  reduction 


In this example, which is illustrated in Table 1, there are ten training instances I0, ..., I9, classified as positive or negative instances of the hypothesis. There are six rules R1, ..., R6, shown with their CF strengths. The marks (x) indicate the instances to which the rules apply, i.e., the instances that satisfy the premise clauses of a rule. To simplify the example, define the combined evidence for an instance as the sum of the evidence contributed by all applicable rules, and let CFt = 0. Rules with a CF of one sign that are connected to an instance of the other sign contribute erroneous evidence. Two cases in the example are misdiagnosed: I4 and I5. The objective is to find a subset of the rule set that minimizes the number of misdiagnoses. Before the details are examined, the following points concerning the example should be made.

First, it can be shown that, if there are equal numbers of positive and negative training instances, no example that uses rules with out-degree less than 5 can exhibit all the points to be made from this example. The argument is trivial for rules with out-degree 1 or 2. For a rule with out-degree 3, assume that it has a positive CF value and is to be deleted. Then it must contribute to the misdiagnosis of some negative instance in order to be blamed. And, in order to have a positive CF, it must provide (positive) evidence for two positive instances, given that the number of positive instances equals the number of negative instances. Therefore, the number of correct diagnoses for which it gives false positive evidence must be zero, since the only negative instance to which it connects is the misdiagnosed one. Its ranking vector is then (n1j, n2j, n3j, n4j, n5j, n6j) = (0, 1, 0, 0, 0, CF), which yields the smallest ranking quantity that a blamed rule with positive CF can have. Thus, the algorithm is not guaranteed to choose it for deletion. The argument for rules with out-degree 4 is similar to the above; alternatively, the CF value is zero if the rule connects to two positive instances and two negative ones. It may be possible to use this observation to devise a heuristic algorithm with better computational performance.

The second point to make is that the CF values attached to the rules are the real values that are calculated based on the formula given in the appendix. Take R1(+.33) for example.

$$
\begin{aligned}
x_1 &: E \text{ true among positive instances} = 3/5 \\
x_2 &: E \text{ true among negative instances} = 2/5 \\
x_3 &: H \text{ true among all instances} = 5/10
\end{aligned}
\tag{9}
$$

Then,

$$
x_4 = \frac{x_1 x_3}{x_1 x_3 + x_2 (1 - x_3)} = 0.60
\tag{10}
$$

Since $x_4 > x_3$,

$$
\mathit{CF} = \frac{x_4 - x_3}{x_4 (1 - x_3)} = \frac{1}{3} = 0.33
\tag{11}
$$
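
As a further check on equations (9) through (11), the same computation can be repeated for R2(+.75). The coverage used here is an assumption, since the text does not state it: with five positive and five negative instances, applying to four positive instances and one negative instance is the combination that yields the CF shown for R2 in Table 1.

$$
x_1 = \tfrac{4}{5}, \quad x_2 = \tfrac{1}{5}, \quad x_3 = \tfrac{1}{2}
\;\Longrightarrow\;
x_4 = \frac{\tfrac{4}{5}\cdot\tfrac{1}{2}}{\tfrac{4}{5}\cdot\tfrac{1}{2} + \tfrac{1}{5}\cdot\tfrac{1}{2}} = 0.80,
\qquad
\mathit{CF} = \frac{0.80 - 0.50}{0.80\,(1 - 0.50)} = 0.75
$$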


We now proceed with the example. Assume that the final rule set must have at least four rules, hence Rmin = 4. Let p_i = 10^(6-i), for 1 ≤ i ≤ 6, thus choosing rules in the highest category, and using lower categories to break ties.

On the first iteration, two misdiagnosed instances are found, I4 and I5, and four rules contribute erroneous evidence toward these misdiagnoses: R1, R2, R4, and R5. Their ranking vectors are shown in Table 2. Clearly, R1 has the highest ranking quantity $\sum_{i=1}^{6} p_i n_{ij}$, thus
        n1j   n2j   n3j   n4j   n5j   n6j
R1       0     1     0     1     0    0.33
R2       0     1     0     0     0    0.75
R4       0     0     1     0     1    0.33
R5       0     0     1     0     0    0.75

Table  2:  The  ranking  vectors  of  blamed  rules 

it is chosen for deletion. On the second iteration, one misdiagnosis is found, I4, and two rules contribute erroneous evidence, R4 and R5. The rules are ranked and R4 is deleted. This reduces the number of misdiagnoses to zero and the algorithm terminates successfully.

The same example can be used to illustrate the problem with the traditional method of rule set debugging, where the order in which cases are checked for misdiagnoses influences which rules are deleted. Consider a Teiresias-style program that looks at the training instances and discovers that I4 is misdiagnosed. There are two rules that contribute erroneous evidence to this misdiagnosis, rules R4 and R5. It wisely notices that deleting R4 causes I6 to become misdiagnosed, hence increasing the number of misdiagnoses; so it chooses to delete R5. However, no matter which rule it now deletes, there will always be at least one misdiagnosed case. To its credit, it reduced the number of misdiagnoses from two to one; however, it fails to converge to a rule set that minimizes the number of misdiagnoses.
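
The walk-through above can be replayed in a few lines of Python. This is a sketch under stated assumptions, not the authors' program. The rule/instance incidence in `APPLIES` is inferred from the CF values, the ranking vectors of Table 2, and the behavior described in the text (the assignment of R4 and R6 to I7 and I9 is an arbitrary choice among equivalent ones). An instance is assumed to be diagnosed positive only when its summed evidence exceeds CFt = 0. The CFs are taken to be exactly 1/3 and 3/4 so that ties are exact, n1j is fixed at 0, and lexicographic tuple comparison stands in for the weighted sum.

```python
from fractions import Fraction

# Exact CFs (1/3 and 3/4 are printed as .33 and .75) so that ties are exact.
CF = {"R1": Fraction(1, 3), "R2": Fraction(3, 4), "R3": Fraction(1, 3),
      "R4": Fraction(-1, 3), "R5": Fraction(-3, 4), "R6": Fraction(-1, 3)}
POSITIVE = {"I0", "I1", "I2", "I3", "I4"}          # I5..I9 are negative
# ASSUMED incidence relation (instance -> applicable rules), inferred from the text.
APPLIES = {
    "I0": {"R3"},
    "I1": {"R1", "R2"},
    "I2": {"R1", "R2", "R3", "R6"},
    "I3": {"R1", "R2", "R4", "R6"},
    "I4": {"R2", "R3", "R4", "R5"},
    "I5": {"R1", "R2", "R5"},
    "I6": {"R1", "R3", "R4", "R6"},
    "I7": {"R4", "R5"},
    "I8": {"R3", "R4", "R5", "R6"},
    "I9": {"R5", "R6"},
}


def misdiagnosed(rules):
    """Instances whose summed evidence falls on the wrong side of CFt = 0."""
    wrong = set()
    for inst, applicable in APPLIES.items():
        evidence = sum(CF[r] for r in applicable & rules)
        if (evidence > 0) != (inst in POSITIVE):    # assumed: positive iff sum > 0
            wrong.add(inst)
    return wrong


def ranking_vector(rule, wrong):
    """(n1j, ..., n6j) for one rule; n1j is fixed at 0 in this sketch."""
    n = [0, 0, 0, 0, 0, abs(CF[rule])]
    rule_is_positive = CF[rule] > 0
    for inst, applicable in APPLIES.items():
        if rule not in applicable or rule_is_positive == (inst in POSITIVE):
            continue                                # not applicable, or evidence has the right sign
        if inst in wrong:
            n[1 if rule_is_positive else 2] += 1    # n2j or n3j
        else:
            n[3 if rule_is_positive else 4] += 1    # n4j or n5j
    return tuple(n)


def reduce_rules(rules, r_min=4):
    """Delete one blamed rule per iteration (h = 1) while misdiagnoses decrease."""
    rules, deleted = set(rules), []
    while len(rules) > r_min:
        wrong = misdiagnosed(rules)
        vectors = {r: ranking_vector(r, wrong) for r in rules}
        blamed = [r for r in rules if vectors[r][1] + vectors[r][2] > 0]
        if not blamed:
            break
        # Tuple comparison is lexicographic, which agrees with p_i = 10^(6-i)
        # as long as every count is below 10.
        worst = max(blamed, key=vectors.get)
        if len(misdiagnosed(rules - {worst})) >= len(wrong):
            break
        rules.remove(worst)
        deleted.append(worst)
    return rules, deleted


final_rules, deleted = reduce_rules(CF)
print(deleted, sorted(misdiagnosed(final_rules)))

# Case-by-case repair: delete R5 to fix I4, then try every further single deletion.
after_r5 = set(CF) - {"R5"}
print(sorted(misdiagnosed(after_r5)),
      min(len(misdiagnosed(after_r5 - {r})) for r in after_r5))
```

Under these assumptions the sketch deletes R1 and then R4 and ends with no misdiagnoses, whereas after deleting R5 first, every further single deletion leaves at least one instance misdiagnosed.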

5.3      Experience  with  the  Sociopathic  Reduction  Algorithm 

Some preliminary experiments with the Sociopathic Reduction Algorithm have been completed, using the Mycin case library, a collection of 112 solved cases obtained from records at the Stanford Medical Hospital. The rule set, of about 370 rules, was the one obtained after (1) correcting an incorrect domain theory, and (2) using apprenticeship learning to extend an incomplete domain theory (Wilkins and Tan, 1989). The Sociopathic Reduction Algorithm removed 21 rules from the knowledge base after 8 iterations. Table 3 shows an improvement of about 10% over the knowledge base tested.
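
The size of the improvement can be read off the totals row of Table 3. The small Python calculation below uses only those totals (112 cases, 73 true positives before reduction, 80 after). Which measure the quoted figure of about 10% refers to is not stated, so treating it as the relative gain in correctly diagnosed cases is an assumption.

```python
cases = 112
tp_before, tp_after = 73, 80          # totals row of Table 3

accuracy_before = tp_before / cases
accuracy_after = tp_after / cases
relative_gain = (tp_after - tp_before) / tp_before

print(f"{accuracy_before:.1%} -> {accuracy_after:.1%}, relative gain {relative_gain:.1%}")
```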

Although our work is largely theoretical, a single experiment is by no means sufficient. Thus, our ongoing experiments involve two kinds
                        Number     Before Reduction      After Reduction
Disease                 Cases      TP    FN    FP        TP    FN    FP
Bacterial Meningitis      16       14     2    13        12     4     4
Brain Abscess              7        1     6     0         1     6     0
Cluster Headache          10        8     2     0         8     2     0
Fungal Meningitis          8        3     5     0         4     4     0
Migraine                  10        6     4     0         7     3     0
Myco-TB Meningitis         4        4     0     1         4     0     3
Primary Brain Tumor       16        3    13     0        10     6     1
Subarach Hemorrhage       21       16     5     3        16     5     4
Tension Headache           9        8     1     3         8     1     1
Viral Meningitis          11       10     1    12        10     1     6
None                       0        0     0     7         0     0    12
Totals                   112       73    39    39        80    32    32

Table  3:    The  Sociopathic  Reduction  Algorithm,  when  applied  to  this 
knowledge  base,  improves  the  performance  by  about  10%. 

of tests. First, we divide the cases into a training set and a validation set (70% and 30%, respectively), so that it can be shown that the performance improvement carries over to the validation set. To be more accurate, we would like to randomly split the cases five times and then average the improvements. Second, we would like to apply the method just described to the various knowledge bases available, for example, a knowledge base after correction of wrong rules only, a knowledge base after application of case-based learning, and so on.
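
The first kind of test, repeated random 70%/30% splits with averaged improvements, could be organized as in the Python sketch below. The function `average_improvement` and its `evaluate` callback are hypothetical: `evaluate(train, validation)` is assumed to run the reduction on the training cases and to return the improvement measured on the validation cases.

```python
import random
from statistics import mean


def average_improvement(cases, evaluate, n_splits=5, train_fraction=0.7, seed=0):
    """Average evaluate(train, validation) over several random splits of the cases."""
    rng = random.Random(seed)                 # fixed seed so the splits are repeatable
    n_train = round(train_fraction * len(cases))
    gains = []
    for _ in range(n_splits):
        shuffled = list(cases)                # copy, so the caller's list is untouched
        rng.shuffle(shuffled)
        train, validation = shuffled[:n_train], shuffled[n_train:]
        gains.append(evaluate(train, validation))
    return mean(gains)
```

With the 112 cases of the case library, each split in this sketch would contain 78 training cases and 34 validation cases.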
6      Related  Work

The  original  contribution  of  this  paper  is  to  show  that  correct  knowledge  can  be  harmful 
independent  of  problem-solving  efficiency  and  that  this  problem  is  widespread.  Another 
contribution  is  to  show  that  the  problem  of  harmful  knowledge  can  be  minimized  and 
problem-solving  performance  improved  by  a  particular  form  of  knowledge  base  reduction, 
and  that  the  optimal  reduction  is  NP-hard. 

The theme of correct knowledge being harmful has been studied by a number of other investigators. Minton has investigated how the learning of correct search control knowledge can slow down a problem solver; his solution approach is to quantify the potential utility of a new piece of control knowledge and only add those pieces with a high utility (Minton and Carbonell, 1987). Markovitch and Scott have shown that any deductively learned knowledge affects the cost of searching a problem space; their solution approach is to use filter functions that can determine whether a piece of past knowledge that has been deductively learned should be used on a current problem (Markovitch and Scott, 1989). Still another approach is to modify learned search control knowledge to increase problem-solving speed (Prieditis and Mostow, 1987).

The theme of improving problem-solving accuracy via knowledge base reduction has been studied in conjunction with eliminating or reducing wrong knowledge. For example, the genetic algorithm used in conjunction with a classifier system eliminates as much as half of a knowledge base; it eliminates rules that have not contributed to past problem-solving successes (Holland, 1986). Another approach is to perform a global analysis of a knowledge base and eliminate those rules that are redundant or inconsistent (Ginsberg et al., 1988).

Learning  systems  that  perform  induction  from  noisy  training  instances  have  also 
addressed  the  problem  of  wrong  knowledge.  The  RULEMOD  program  of  META-DENDRAL 
selects  a  subset  of  rules  that  have  wide  applicability,  thereby  reducing  the  number  of  wrong 
rules  (Buchanan  and  Mitchell,  1978).  RULEMOD  also  selects  rules  that  jointly  form  a 
good  global  cover  and  hence  shares  our  concern  for  finding  rules  that  work  well  together. 
The  TRUNC  program  of  AQ15  deletes  those  disjunctions  of  non-probabilistic  induced  rules 
that  cover  the  fewest  cases  (Michalski  et  al.,  1986a;  Michalski  et  al.,  1986b).  The  reduced 
knowledge  bases  produced  by  RULEMOD  and  TRUNC  give  equal  or  superior  performance. 
7      Summary  and  Conclusion

Traditional  methods  of  debugging  a  probabilistic  rule  set  are  suited  to  handling  missing 
or  wrong  rules,  but  not  to  handling  deleterious  interactions  between  good  rules.  This 
paper  describes  the  underlying  reason  for  this  phenomenon.  We  formulated  the  problem 
of  minimizing  deleterious  rule  interactions  as  a  bipartite  graph  minimization  problem  and 
proved  that  it  is  NP-hard.  A  heuristic  method  was  described  for  solving  the  graph  problem, 
called  the  Sociopathic  Reduction  Algorithm.  In  our  experiments,  the  Sociopathic  Reduction 
Algorithm  gave  good  results. 

We believe that the rule set refinement method described in this paper, or its equivalent, is an important component of any learning system for the automatic creation of probabilistic rule sets for automated reasoning systems. All such learning systems will confront the problem of deleterious interactions among good rules, and the problem will require a global solution method, such as the one we have described here.

Our future research in this area is to create a theory of sociopathicity that subsumes all AI techniques for uncertainty reasoning, including certainty factors, Bayesian methods, probability methods, Dempster-Shafer theory, fuzzy reasoning, belief networks, and non-monotonic reasoning. For our progress to date, see (Ma and Wilkins, 1990a; Ma and Wilkins, 1990b; Ma and Wilkins, 1990c).
Appendix  1:  Calculating  G.

Consider rules of the form conclude(H, CF) :- E. Then $\mathit{CF} = G = G(x_1, x_2, x_3)$ = empirical predictive power of rule R, where:

•  $x_1 = P(E^+ \mid H^+)$ = fraction of the positive instances in which R correctly succeeds (true positives or false negatives)

•  $x_2 = P(E^+ \mid H^-)$ = fraction of the negative instances in which R incorrectly succeeds (false positives or true negatives)

•  $x_3 = P(H^+)$ = fraction of all instances that are positive instances

Given $x_1, x_2, x_3$, let

$$
x_4 = P(H^+ \mid E^+) = \frac{x_1 x_3}{x_1 x_3 + x_2 (1 - x_3)}.
$$

If $x_4 > x_3$ then $G = \dfrac{x_4 - x_3}{x_4 (1 - x_3)}$, else $G = \dfrac{x_4 - x_3}{x_3 (1 - x_4)}$.

This probabilistic interpretation reflects the modifications to the certainty factor model proposed by Heckerman (1986).
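
The definition of G translates directly into a short function. The Python sketch below is illustrative: the name `predictive_power` is an assumption, the second branch follows the certainty factor form attributed to Heckerman, and exact fractions are used so that the value for R1 in section 5.2 comes out as exactly 1/3.

```python
from fractions import Fraction


def predictive_power(x1, x2, x3):
    """G(x1, x2, x3): CF of a rule from P(E+|H+), P(E+|H-), and P(H+).

    Raises ZeroDivisionError in degenerate cases (for example x1 = x2 = 0,
    where the rule never fires and P(H+|E+) is undefined).
    """
    x4 = x1 * x3 / (x1 * x3 + x2 * (1 - x3))      # P(H+|E+) by Bayes' rule
    if x4 > x3:                                   # evidence raises belief in H
        return (x4 - x3) / (x4 * (1 - x3))
    return (x4 - x3) / (x3 * (1 - x4))            # evidence lowers (or keeps) belief


# R1 from section 5.2: E holds in 3/5 of the positive and 2/5 of the negative instances.
cf_r1 = predictive_power(Fraction(3, 5), Fraction(2, 5), Fraction(1, 2))
print(cf_r1, float(cf_r1))
```

Swapping the first two arguments gives the mirror-image rule with CF = -1/3, and a rule that fires equally often on positive and negative instances gets CF = 0.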